ABSTRACT The Middle East respiratory syndrome coronavirus (MERS-CoV) causes a severe acute respiratory tract infection
with a high fatality rate in humans. Coronaviruses are capable of infecting multiple species and can evolve rapidly through re- 
combination events. Here, we report the complete genomic sequence analysis of a MERS-CoV strain imported to China from 
South Korea. The imported virus, provisionally named ChinaGD01, belongs to group 3 in clade B in the whole-genome phylogenetic
tree and also has a similar tree topology structure in the open reading frame 1a and -b (ORF1ab) gene segment but clusters
with group 5 of clade B in the tree constructed using the S gene. Genetic recombination analysis and lineage-specific single- 
nucleotide polymorphism (SNP) comparison suggest that the imported virus is a recombinant comprising group 3 and group 5 
elements. The time-resolved phylogenetic estimation indicates that the recombination event likely occurred in the second half of 
2014. Genetic recombination events between group 3 and group 5 of clade B may have implications for the transmissibility of the 
virus. 


IMPORTANCE The recent outbreak of MERS-CoV in South Korea has attracted global media attention due to the speed of spread 
and onward transmission. Here, we present the complete genome of the first imported MERS-CoV case in China and demonstrate
genetic recombination events between group 3 and group 5 of clade B that may have implications for the transmissibility
of MERS-CoV.

Middle East respiratory syndrome coronavirus (MERS-CoV),
first detected in the Kingdom of Saudi Arabia
(KSA) in 2012, causes severe acute respiratory tract infection in 
humans, with a high case fatality rate (CFR) (1-4). Dromedary 
camels are believed to be important reservoir hosts or vectors 
for human infection; bats may also be implicated (5-8). As of 
17 July 2015, 1,368 laboratory-confirmed cases of human in- 
fection with MERS-CoV had been reported to the World 
Health Organization (WHO), including at least 490 deaths, 
corresponding to a CFR as high as 35.45% (9). Recent MERS 
clusters in South Korea are thought to be the largest outbreak 
outside the Middle East (10). As of 25 July 2015, 186
laboratory-confirmed cases of MERS-CoV infection (including
36 deaths) had been reported in South Korea (9). A South
Korean man who was a relative of some of the laboratory- 
confirmed cases traveled to Guangdong Province (10) and was 
diagnosed as the first imported MERS-CoV case in China by
molecular detection of MERS-CoV (11, 12). The rapid spread 
of disease in South Korea raised concerns that the imported 
virus had evolved to become more transmissible. Here, we re- 
port a comprehensive phylogenetic analysis of the complete 
MERS-CoV genome sequence of the first Chinese imported 
case of MERS (ChinaGD01), and the results indicate its prob- 
able origin and show evidence of genetic recombination. 


RESULTS 

Patient and sample history. The current outbreak in South Korea 
and China was initiated when a 68-year-old Korean man flew back 
to Seoul on 4 May 2015 after a visit to four Middle East countries 
(Bahrain, United Arab Emirates, Saudi Arabia, and Qatar). On 
26 May 2015, a 44-year-old South Korean man presented with 
fever to a hospital in Guangdong. He had been in close contact with the
index patient in South Korea on 16 May 2015 (Fig. 1), as well as a
suspected second-generation patient. The timeline of the travel
history, potential virus exposure, onset of disease, and diagnosis of
the first imported MERS-CoV case in China is presented in
Fig. 1.

FIG 1 Timeline of the travel history, potential virus exposure, onset of disease, and diagnosis of the first imported MERS-CoV case in China. UAE, United Arab
Emirates; KSA, Kingdom of Saudi Arabia.

FIG 2 Phylogenetic relationships based on complete genomes (A), ORF1ab genes (B), and S genes (C) of MERS-CoV strains. China’s first imported MERS-CoV strain
(GenBank accession no. KT006149.2), South Korea’s first MERS-CoV strain (GenBank accession no. KT029139), and the latest MERS-CoV strains prevalent in the
Middle East (GenBank accession no. KT026453 to KT026456) are indicated in red. The MERS-CoV strains derived from camels are indicated in blue. All of the complete
genomes were analyzed by nucleotide sequence alignment using the maximum-likelihood method implemented in RAxML. Numbers at the nodes indicate
bootstrap support for each node (percentage of 1,000 bootstrap replicates). Scale bars indicate the expected number of nucleotide substitutions per site.

Characterization of genome. With informed consent and the 
approval of the ethical committee of the National Institute of Viral 
Disease Control and Prevention, China Center for Disease Con- 
trol and Prevention (CDC), nasopharyngeal swabs were collected 
and used for RNA extraction, followed by reverse transcription 
PCR and genome sequencing. Through both Sanger and Ion
Torrent sequencing, the full-length virus genome (30,144 bp) of
ChinaGD01 was obtained and deposited in GenBank (accession 
no. KT006149). Over 2,000,000 paired-end reads were quality 
trimmed and processed to remove human genome sequences. 
Nonhuman reads were assembled into contigs by CLC Genomics
Workbench and aligned against representative sequences of 
MERS-CoV. No nucleotide insertions or deletions were observed 
in the genome. 

The genome sequence of this virus, referred to as ChinaGD01, 
had high levels of nucleotide identity (99.33% to 99.79%) to pre- 
viously published MERS-CoV genomes (Fig. 2), with 99.31% to 
99.78% sequence identity in the open reading frame 1a and -b 
(ORF1ab) gene segment and 98.91% to 99.60% identity in the S
gene. The E, M, and N genes had 98.93% to 100% identity with
previously described MERS-CoV strains. In total, ChinaGD01
possessed 11 nonsynonymous nucleotide substitutions,
which occurred in the ORF1ab (n = 8), ORF3 (n = 1), ORF4b
(n = 1), and M (n = 1) genes (Table 1). Although
there were five nucleotide substitutions in the S gene, no amino 
acid change was discovered. Of note, in comparison with previ- 
ously published MERS-CoV genomes, the ChinaGD01 genome 
shows 11 unique amino acid substitutions, and 8 of them were 
shared with the newly released South Korean strains and the latest 
strains prevalent in Saudi Arabia (Table 1). 
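
The substitution counts quoted above can be cross-checked against Table 1. The following Python sketch is an illustration only and was not part of the original analysis: the amino acid columns are transcribed by hand from Table 1 (transcription errors are possible), and the names `TABLE1_AA` and `count_substitutions` are hypothetical.

```python
from collections import Counter

# Amino acid columns transcribed from Table 1 (assumed to be read correctly):
# (gene, position, China, South Korea, Saudi Arabia, Others)
TABLE1_AA = [
    ("ORF1ab", 472, "Val", "Val", "Val", "Phe"), ("ORF1ab", 2496, "Phe", "Phe", "Phe", "Phe"),
    ("ORF1ab", 2930, "Gly", "Gly", "Asp", "Asp"), ("ORF1ab", 3706, "Thr", "Thr", "Thr", "Ala"),
    ("ORF1ab", 5895, "Ser", "Ser", "Ser", "Ser"), ("ORF1ab", 6357, "Ile", "Ile", "Ile", "Met"),
    ("ORF1ab", 6876, "Phe", "Phe", "Phe", "Phe"), ("ORF1ab", 7277, "Ile", "Ile", "Ile", "Thr"),
    ("ORF1ab", 9678, "Ser", "Ser", "Ser", "Ser"), ("ORF1ab", 10716, "Val", "Val", "Val", "Val"),
    ("ORF1ab", 11649, "Asp", "Asp", "Asp", "Asp"), ("ORF1ab", 17868, "Ile", "Ile", "Ile", "Val"),
    ("ORF1ab", 19739, "Ile", "Ile", "Ile", "Met"), ("ORF1ab", 20685, "Ser", "Ser", "Gly", "Gly"),
    ("S", 258, "Val", "Val", "Val", "Val"), ("S", 1848, "Val", "Val", "Val", "Val"),
    ("S", 2841, "Tyr", "Tyr", "Tyr", "Tyr"), ("S", 3177, "Asp", "Asp", "Asp", "Asp"),
    ("S", 3267, "Ala", "Ala", "Ala", "Ala"),
    ("ORF3", 49, "Phe", "Phe", "Phe", "Leu"), ("ORF3", 183, "Asp", "Asp", "Asp", "Asp"),
    ("ORF3", 237, "Ser", "Ser", "Ser", "Ser"),
    ("ORF4a", 258, "Asp", "Asp", "Asp", "Asp"), ("ORF4b", 17, "Thr", "Met", "Thr", "Met"),
    ("ORF5", 228, "Leu", "Leu", "Leu", "Leu"),
    ("M", 367, "Ile", "Ile", "Ile", "Phe"), ("M", 438, "Gly", "Gly", "Gly", "Gly"),
]


def count_substitutions(rows):
    """Count sites where ChinaGD01 differs from 'Others' at the amino acid level.

    Returns a per-gene Counter and the number of those sites at which the
    China, South Korea, and Saudi Arabia columns all carry the same residue.
    """
    per_gene = Counter()
    shared = 0
    for gene, _position, china, korea, saudi, others in rows:
        if china != others:
            per_gene[gene] += 1
            if china == korea == saudi:
                shared += 1
    return per_gene, shared


per_gene, shared = count_substitutions(TABLE1_AA)
print(dict(per_gene), sum(per_gene.values()), shared)
```

Under this transcription, the per-gene tallies and the number of substitutions shared by the three 2015 columns agree with the figures given in the text.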

Phylogenetic analysis. To further investigate the genetic re- 
lationship between ChinaGD01 and other MERS-CoV strains 
whose genomes are available, we performed phylogenetic analyses
using the complete genome, the ORF1ab gene, and the S
gene. From the whole-genome phylogeny, all available MERS- 
CoV strains can be clustered into two clades, the earlier clade A 
and the more recent clade B (Fig. 2A). ChinaGD01 fell into 
group 3 of clade B (Fig. 2A). Within group 3, ChinaGD01 and
the South Korean and Saudi Arabian strains from 2015 were 
closely clustered and formed a long branch, separate from oth- 
ers of group 3. The nearest strain to this branch was Hafr-Al- 
Batin-1-2013 (GenBank accession no. KF600628), isolated in 
August 2013. Phylogenetic analysis of the ORF1ab gene indicated
a similar topology in which ChinaGD01 and the recent MERS-CoV
strains identified in South Korea were closely adjacent to
Hafr-Al-Batin-1-2013 in group 3 (Fig. 2B). However, the phylog- 
eny of the S gene differed in that the new viruses fell into group 5 
and were closely related to viruses from both humans and drom- 
edaries (Fig. 2C). These findings are consistent with recombina- 
tion, a phenomenon not uncommon in coronaviruses. 

TABLE 1 Comparison of sites of variation between gene sequences of
ChinaGD01, the first South Korean strain, the latest Saudi Arabia
strains, and other MERS-CoV strains

Gene    Position (bp)  China      South Korea  Saudi Arabia  Others
                       Nt  Aa     Nt  Aa       Nt  Aa        Nt  Aa
ORF1ab  472            G   Val    G   Val      G   Val       T   Phe
        2496           C   Phe    C   Phe      C   Phe       T   Phe
        2930           G   Gly    G   Gly      A   Asp       A   Asp
        3706           A   Thr    A   Thr      A   Thr       G   Ala
        5895           G   Ser    C   Ser      C   Ser       T   Ser
        6357           T   Ile    T   Ile      T   Ile       G   Met
        6876           C   Phe    C   Phe      T   Phe       T   Phe
        7277           T   Ile    T   Ile      T   Ile       G   Thr
        9678           T   Ser    T   Ser      C   Ser       C   Ser
        10716          T   Val    T   Val      G   Val       C   Val
        11649          C   Asp    C   Asp      T   Asp       T   Asp
        17868          A   Ile    A   Ile      A   Ile       G   Val
        19739          T   Ile    T   Ile      T   Ile       G   Met
        20685          A   Ser    A   Ser      G   Gly       G   Gly
S       258            C   Val    C   Val      G   Val       T   Val
        1848           C   Val    T   Val      T   Val       T   Val
        2841           C   Tyr    C   Tyr      C   Tyr       T   Tyr
        3177           C   Asp    C   Asp      T   Asp       T   Asp
        3267           T   Ala    T   Ala      G   Ala       T   Ala
ORF3    49             T   Phe    T   Phe      T   Phe       C   Leu
        183            T   Asp    T   Asp      C   Asp       C   Asp
        237            T   Ser    T   Ser      T   Ser       A   Ser
ORF4a   258            C   Asp    T   Asp      C   Asp       T   Asp
ORF4b   17             C   Thr    T   Met      C   Thr       T   Met
ORF5    228            T   Leu    T   Leu      T   Leu       G   Leu
M       367            A   Ile    A   Ile      A   Ile       T   Phe
        438            T   Gly    T   Gly      T   Gly       C   Gly

The positions of amino acid substitutions are indicated by boldface. China, imported
MERS-CoV strain ChinaGD01 (GenBank accession no. KT006149); South Korea,
South Korea’s first MERS-CoV strain (GenBank accession no. KT029139); Saudi
Arabia, MERS-CoV strains recently identified in Saudi Arabia (GenBank accession no.
KT026453 to KT026456); Others, other MERS-CoV strains detected worldwide; Nt,
nucleotide; Aa, amino acid.

Genetic recombination analysis. To examine whether genetic
recombination has occurred in ChinaGD01, we performed
bootscanning analyses. We compared ChinaGD01 with
representative viruses from group 3 (Hafr-Al-Batin-1-2013;
GenBank accession no. KF600628), group 5 (KSA-CAMEL-378;
GenBank accession no. KJ713296), and group 1 (Abu Dhabi_
UAE_9_2013; GenBank accession no. KP209312) as controls. As
shown in Fig. 3A, ChinaGD01 was more similar to the group 3 
strain from position 1 to 15,000 and more similar to the group 5 
strain from approximately position 18,000 to 24,000. We then 
compared the single-nucleotide polymorphisms (SNPs) of 
ChinaGD01 with consensus sequences of group 3 and group 5 
(Fig. 3B; see also Fig. S1 and S2 in the supplemental material). 
There were 78 SNPs discovered along the ChinaGD01 genome 
(Fig. 3B). Whereas before position 17,206, ChinaGD01’s SNP pat- 
tern is nearly identical to that of the group 3 viruses, its SNP 
pattern is more similar to that of group 5 viruses between posi- 
tions 17,311 and 23,804. The consistency in the results of 
bootscanning and SNP analyses supports the hypothesis that the 
gene segment from approximately position 17,300 to 24,000, representing
portions of the ORF1ab and S genes, reflects a recombination
event (Fig. 3B).
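
The SNP comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that the query and the two consensus sequences are already aligned to the same length, it uses short toy sequences rather than MERS-CoV data, and the function names `classify_snps` and `label_changes` are hypothetical.

```python
def classify_snps(query, cons_a, cons_b, label_a="group3", label_b="group5"):
    """Label each informative site (where the two consensus sequences differ)
    by the consensus that the query matches at that site."""
    if not (len(query) == len(cons_a) == len(cons_b)):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos, (q, a, b) in enumerate(zip(query, cons_a, cons_b), start=1):
        if a == b:
            continue  # both groups agree here, so the site is uninformative
        if q == a:
            calls.append((pos, label_a))
        elif q == b:
            calls.append((pos, label_b))
        else:
            calls.append((pos, "neither"))
    return calls


def label_changes(calls):
    """Return pairs of consecutive informative positions whose labels differ;
    a candidate breakpoint lies somewhere between the two positions."""
    return [(p1, p2) for (p1, l1), (p2, l2) in zip(calls, calls[1:])
            if l1 != l2 and "neither" not in (l1, l2)]


# Toy alignment (hypothetical data): the consensus sequences differ at sites 2, 5, 8, 11.
cons_group3 = "AAAAAAAAAAAA"
cons_group5 = "ACAACAACAACA"
query = "AAAAAAACAAAA"  # matches group 3, except at site 8 where it matches group 5

calls = classify_snps(query, cons_group3, cons_group5)
print(calls)
print(label_changes(calls))
```

In this toy case the labels switch from group 3 to group 5 and back, which mirrors the kind of pattern reported for ChinaGD01 around positions 17,206 to 17,311 and after position 23,804.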

Phylogenetic analysis was further performed using BEAST 
with the complete genome, the nonrecombinant region (positions 
1 to 17,300), and the potential recombinant region (positions 
17,301 to 24,000), respectively (Fig. 4). The phylogenies revealed 
by the BEAST trees were consistent with those from the 
maximum-likelihood trees. In the trees constructed using the
complete genome and the nonrecombinant region, ChinaGD01
fell within group 3; however, in trees constructed using the
recombinant region, it clustered with the group 5 sequences.

FIG 3 Recombination analyses of complete MERS-CoV genomes. (A) Bootscanning analysis of MERS-CoV genome. The ChinaGD01 strain was used as a query
sequence and compared with one strain from group 3 (GenBank accession no. KF600628.1), one from group 5 (GenBank accession no. KJ713296.1), and one
from group 1 (GenBank accession no. KP209312.1). (B) Single-nucleotide differences between the ChinaGD01 sequence and consensus sequences of group 3 and
group 5. Group 3 cons, consensus sequences of group 3 strains; group 5 cons, consensus sequences of group 5 strains; South Korea, first MERS-CoV strain
(GenBank accession no. KT029139) in South Korea; KSA-2015, latest strains prevalent in Saudi Arabia (GenBank accession no. KT026453 to KT026456);
Bisha-1/2012, an earlier strain used as a control.

To date the recombination event, we estimated the time to 
most recent common ancestor for the novel MERS-CoV from 
2015. Although the results differed slightly among
models, the time to most recent common ancestor
of the 2015 cluster was estimated to be 0.5 to
0.7 years before the identification of the imported case, that is, in the
latter months of 2014 (Table 2). Given the observation of similar
recombination events in the newly released South Korean
strains and the latest strains prevalent in Saudi Arabia, the 
travel histories of patients, and potential opportunities for vi- 
rus exposure, we surmise that the recombination likely oc- 
curred in the Arabian Peninsula. 
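
The conversion from a time to most recent common ancestor (in years) to a calendar date can be sketched as follows. This is an illustrative calculation under stated assumptions, not the authors' procedure: the reference date is taken to be 26 May 2015 (the day the patient presented in Guangdong, since the exact sampling date is not given here), a year is approximated as 365.25 days, and the two inputs are the smallest and largest mean estimates in Table 2.

```python
from datetime import date, timedelta

REFERENCE_DATE = date(2015, 5, 26)  # assumption: presentation date used as the sampling date
DAYS_PER_YEAR = 365.25              # assumption: average year length


def years_before(reference, years):
    """Return the calendar date lying `years` before `reference`.

    Subtracting a timedelta from a date ignores the fractional part of a day.
    """
    return reference - timedelta(days=years * DAYS_PER_YEAR)


# Smallest and largest mean estimates reported in Table 2 (years)
most_recent = years_before(REFERENCE_DATE, 0.5199)
earliest = years_before(REFERENCE_DATE, 0.6621)
print(earliest.isoformat(), most_recent.isoformat())
```

Under these assumptions both bounds fall in the second half of 2014, in line with the estimate stated in the text; the 95% credible intervals in Table 2 are wider than this range of means.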


DISCUSSION 


Over the past 3 years, MERS-CoV infections have continued to 
increase, posing a serious threat to global public health. Previous 
studies have revealed that MERS-CoV infections are likely due to
repeated introductions of MERS-CoV from dromedary camels to 
humans (13-15), resulting in only limited human-to-human 
transmission (16). However, the large number of second- and 
third-generation cases in South Korea raised concerns that MERS- 
CoV may have evolved to become more adapted to human-to- 
human transmission. 

Our results indicate that at the whole-genome level, 
ChinaGD01 is >99% similar to the previously identified MERS- 
CoV strains. Phylogenetic analysis based on the whole-genome 
sequence revealed that it belongs to group 3 of clade B MERS-CoV
strains and forms a separate small branch with viruses from South 
Korea and Saudi Arabia from 2015. Different phylogenies were 
observed in the trees constructed using the full-length genome 
and the S gene, indicating the possibility of a recombination event. 
Further evidence of a recombination event was obtained through 
bootscanning and SNP analyses. BEAST analysis revealed that it 
might have occurred recently, in the second half of 2014, in the 
Middle East. 
FIG 4 Time-resolved phylogenetic analyses of complete genomes (A), nonrecombination regions (B), and recombination regions (C) of MERS-CoV strains
using BEAST. The nonrecombination region is approximately bp 1 to 17,300, and the recombination region is approximately bp 17,301 to 24,000. ChinaGD01
(GenBank accession no. KT006149), South Korea’s first MERS-CoV strain (GenBank accession no. KT029139), and the latest strains prevalent in Saudi Arabia
(GenBank accession no. KT026453 to KT026456) are indicated in red.


Genetic recombination has been well established in severe 
acute respiratory syndrome coronavirus (SARS-CoV) (17, 18); 
however, there is only one report of genetic recombination in 
MERS-CoV (19). Dudas and Rambaut point to frequent re- 
combination in MERS-CoV and partition the genome into two 
parts in which nucleotides 1 to 23,722 and nucleotides 23,723 
to 30,126 have independent molecular clock rates. Based on the 
latest genome sequences from South Korea and the Kingdom of 
Saudi Arabia, our research indicated that a novel type of genetic 
recombination has occurred in the MERS-CoV strains preva- 
lent in South Korea. We note that six MERS-CoV isolates from 
2015 (ChinaGD01, the first MERS-CoV strain from South Ko- 
rea, and the four latest strains from Saudi Arabia) had high 
levels of nucleotide identity (99.90% to 99.96%) and showed 
the same recombination signal in our analyses. We speculate 
that they arose from a common recombination event. How- 
ever, more studies are needed to understand the relationship 
between genetic recombination of MERS-CoV, the biological 
properties it conveys, and its relevance to the recent high rate of 
transmission. 


MATERIALS AND METHODS 


Full-length genomic sequencing. Nasopharyngeal swabs from the South 
Korean patient diagnosed with MERS-CoV infection were collected and
used for viral RNA extraction with the QIAamp viral RNA minikit. Forty-four
sets of specific primer pairs were designed and used to amplify the
complete genome, followed by Sanger sequencing; meanwhile, the ex- 
tracted viral RNA was also used for next-generation sequencing with the 
Ion Torrent PGM after random amplification. 

Phylogenetic analysis. We downloaded all (n = 92) available full- 
length genome sequences of MERS-CoV from GenBank and used RAxML 
(20) for phylogenetic analyses of the complete genome, the ORF1ab gene,
and the S gene, respectively. One thousand bootstrap replicates were run. 
Furthermore, the Bayesian Markov chain Monte Carlo method, imple- 
mented in BEAST (21), was used to estimate the time to the most recent 
common ancestor. Twelve different model combinations were applied. 
For all the analyses, we used the general time-reversible nucleotide sub- 
stitution model with gamma-distributed rate heterogeneity. Bayesian 
Markov chain Monte Carlo analysis was run for 50 million steps. Trees 
and parameters were sampled every 5,000 steps, with the first 10% re- 
moved as burn-in. 
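
The bookkeeping implied by these settings can be made explicit with a short Python sketch. It is illustrative only: the twelve combinations are assumed to be the three clock models crossed with the four coalescent models listed in Table 2, the variable names are hypothetical, and the sample count ignores any initial state that the sampler may also log.

```python
from itertools import product

# Assumption: the twelve combinations are the clock x coalescent models of Table 2.
CLOCK_MODELS = ["strict", "exponential relaxed", "lognormal relaxed"]
COALESCENT_MODELS = ["constant size", "exponential growth",
                     "logistic growth", "GMRF Bayesian skyride"]
model_grid = list(product(CLOCK_MODELS, COALESCENT_MODELS))

CHAIN_LENGTH = 50_000_000  # MCMC steps per run
SAMPLE_EVERY = 5_000       # sampling interval in steps
BURN_IN_PERCENT = 10       # share of samples discarded as burn-in

n_samples = CHAIN_LENGTH // SAMPLE_EVERY
n_burn_in = n_samples * BURN_IN_PERCENT // 100
n_retained = n_samples - n_burn_in

print(len(model_grid), n_samples, n_burn_in, n_retained)
```

Each of the twelve runs therefore contributes on the order of nine thousand post-burn-in samples to its posterior summary.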

Genetic recombinant analysis. Similarity plots and bootscan- 
ning analysis were generated by SimPlot (22); a sliding window of 200 
nucleotides was used, moving in 20-nucleotide steps. Single- 
nucleotide-difference analysis was used to confirm the recombination 
event. 
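
The sliding-window idea behind a similarity plot can be sketched in Python as follows. This is a simplified stand-in rather than SimPlot itself: it reports raw per-window identity and does not perform the bootstrapped tree building that bootscanning involves. It assumes pre-aligned, equal-length sequences; the function names and the toy sequences are hypothetical, while the window (200) and step (20) follow the settings stated above.

```python
def window_identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_scan(query, references, window=200, step=20):
    """Slide a window along the alignment and return, for each window,
    its midpoint and the identity of the query to every reference."""
    n = len(query)
    if any(len(seq) != n for seq in references.values()):
        raise ValueError("all sequences must be aligned to the same length")
    rows = []
    for start in range(0, n - window + 1, step):
        end = start + window
        scores = {name: window_identity(query[start:end], seq[start:end])
                  for name, seq in references.items()}
        rows.append((start + window // 2, scores))
    return rows


# Toy data (hypothetical): a chimeric query that follows one reference for
# its first 600 positions and the other reference for its last 400.
ref_group3 = "A" * 1000
ref_group5 = "C" * 1000
query = "A" * 600 + "C" * 400

rows = similarity_scan(query, {"group3": ref_group3, "group5": ref_group5})
for midpoint, scores in rows[::10]:
    print(midpoint, scores)
```

In the toy output the identity to the first reference falls from 1.0 to 0.0 as the window crosses position 600, and the point at which the two curves cross marks the approximate breakpoint.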

Nucleotide sequence accession number. The full-length virus ge- 
nome (30,144 bp) of ChinaGD01 was deposited in GenBank under acces- 
sion no. KT006149. 


TABLE 2 Estimated times in years to most recent common ancestor of the 2015 cluster

                           Value for indicated coalescent model [mean (95% CI)]
Molecular clock model      Constant size            Exponential growth       Logistic growth          GMRF Bayesian skyride*
Strict clock               0.6594 (0.4259, 0.9137)  0.6046 (0.371, 0.8562)   0.6621 (0.4167, 0.9201)  0.6007 (0.3611, 0.8396)
Exponential relaxed clock  0.6008 (0.3586, 0.8944)  0.5208 (0.3339, 0.7868)  0.6037 (0.3578, 0.9059)  0.5199 (0.3141, 0.8026)
Lognormal relaxed clock    0.6394 (0.4049, 0.9247)  0.582 (0.3684, 0.8437)   0.641 (0.4085, 0.9167)   0.5711 (0.3351, 0.8264)

* GMRF, Gauss Markov random fields.